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Abstract 

Discretizations of infinite-dimensional variational inequalities lead to 
linear and nonlinear complementarity problems with many degrees of free- 
dom. To solve these problems in a parallel computing environment, we 
propose two active-set methods that solve only one linear system of equa- 
tions per iteration. The linear solver, preconditioner, and matrix struc- 
tures can be chosen by the user for a particular application to achieve 
high parallel performance. The parallel scalability of these methods is 
demonstrated for some discretizations of infinite-dimensional variational 
inequalities. 

1 Introduction 

Achieving high performance for numerical methods in parallel computing envi- 
ronments demands that the user have the ability to customize algorithms, linear 
solvers, and data structures for their particular problems. Closed environments 
or algorithms requiring specific linear algebra constrain the choices available 
to the user, inevitably leading to inefficiency. This paper concerns a flexible, 
open environment, the Toolkit for Advanced Optimization, and two algorithms 
for solving complementarity problems implemented so that the user can exploit 
problem structure. The benefits of this design are significant reductions in solu- 
tion time and good parallel scalability on targeted complementarity problems. 

Complementarity problems arise in various application areas (see, e.g., [221 
I18| ) . In this paper, we are interested particularly in applications arising from 
infinite-dimensional box-constrained variational inequalities where a discretiza- 
tion of the problem corresponds to a large-scale linear or nonlinear complemen- 
tarity problem [151 125| . One example of this type, the journal bearing problem 
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3 , involves determining the pressure distribution of a thin film of lubricant 
between two circular cylinders. This problem is posed as an elliptic partial dif- 
ferential equation with a free boundary. A finite-difference scheme for solving 
instances of this problem has n x n y degrees of freedom, where n x and n y are 
the number of points in the spatial discretizations. The large number of de- 
grees of freedom and structure imposed by the finite-difference scheme implies 
that high performance can be achieved when solving this problem in a parallel 
environment. Other applications that can be posed as infinite-dimensional vari- 
ational inequalities include pricing American options 23 34 , nonlinear obstacle 
problems 30 , and optimal control problems |33| . 

The Toolkit for Advanced Optimization (TAO) provides a flexible environ- 
ment for solving optimization and complementarity problems in parallel [S]. 
Several numerical linear algebra packages have been incorporated into TAO 

01 |2] E| • These packages offer many algorithmic choices so that a user can 
select the most appropriate method for a particular application. Novice users 
can rely on the provided defaults, while more advanced users have the freedom 
to select their problem representation and linear algebra to improve parallel per- 
formance. All of the source code for TAO can be downloaded from 7 , which 
includes our implementations of several algorithms for solving complementarity 
problems. 

Section 2 discusses the design philosophy of and facilities provided by the 
Toolkit for Advanced Optimization. The complementarity algorithms used to 
solve the discretized nonlinear complementarity problems in a parallel comput- 
ing environment are discussed in Section 3. Two methods were implemented 
and tested: a semismooth method ^] E3 and a reduced space method similar 
to those presented in ^] Ej • These algorithms are attractive because they 
solve only a single system of linear equations per iteration. These solves can be 
performed by using preconditioned iterative methods |31| . taking advantage of 
the research lavished on numerical linear algebra for solving partial differential 
equations. Numerical results on the MCPLIB collection 01 complementar- 
ity problems are presented in Section 4 to demonstrate that the methods are 
reasonably robust on general problems. The parallel performance achieved on 
several discretizations of infinite-dimensional variational inequalities shows the 
benefits of customizing the linear algebra: both good parallel performance and 
scalability are achieved. 

2 Design Philosophy 

Traditionally, the numerical methods used in an algorithm implementation to 
solve linear systems of equations are chosen by the developer. Since most float- 
ing point operations are typically performed by the linear system solver, con- 
siderable effort has been expended to select and implement one that is efficient 
and robust for a diverse collection of applications. Efficiency for particular 
problem instances has been sacrificed in exchange for robustness and general 
applicability. However, linear solvers tailored to a particular application may 
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save a significant amount of computation and allow the method to find solu- 
tions to larger problems than previously possible. While developers choose a 
linear solver without any knowledge of the application, many applications have 
structure that can be exploited. The Toolkit for Advanced Optimization (TAO) 
0013 was specifically designed to allow the user to customize the linear solver 
to their application so that they can achieve high performance and parallel 
scalability. 

One linear solver choice is an LU factorization, which can be applied to arbi- 
trary nonsingular systems. Many robust implementations have been developed 
that utilize sparsity while maintaining numerical stability. The complementar- 
ity methods in TAO allow for the use of the LU factorizations from LAPACK 
1 for dense systems and LUSOL for sparse systems. Nonetheless, these 
factorizations do not exploit symmetry, strong monotonicity, or other features 
of the matrix, such as block structure. Furthermore, direct factorizations can 
impose excessive memory requirements even when the given matrix is sparse. 
Access to Cholesky factorizations are provided when a symmetric positive defi- 
nite linear system is solved. In this case, the number of floating-point operations 
can be reduced by a factor of two, and further gains can be achieved by stable 
reorderings of the matrix based solely on the sparsity pattern. 

Aside from direct methods, many iterative techniques can be applied to 
solve linear systems of equations. These techniques typically do not impose 
the memory requirements of direct methods. Two of the most common tech- 
niques are GMRES [22] for general nonsingular matrices and the conjugate 
gradient method |20| for symmetric positive definite matrices. Convergence of 
these methods can be significantly improved by using a preconditioner, such 
as an incomplete factorization, which works well on general problems, or an 
application-specific preconditioner, such as an overlapping additive Schwartz 
method 

In a parallel environment, the cost of message passing magnifies the impor- 
tance of an appropriate selection of the linear system solver and preconditioner. 
Parallel implementations of GMRES, conjugate gradients, and many other iter- 
ative methods, along with scalable preconditioners such as incomplete factoriza- 
tions and additive Schwartz methods, are provided in the PETSc toolkit [SUHJ- 
These linear system solvers use the Message Passing Interface (MPI) for 
communication between processors. 

Providing the user with sufficient flexibility in the choice of linear solver re- 
quires careful selection and implementation of the complementarity algorithms. 
In particular, the interface to the numerical objects, iterative methods and lin- 
ear algebra, must be separated from their implementations. Object-oriented 
techniques permit the use of serial or parallel data structures with minimal 
changes to the interface and allow new implementations of linear system solvers 
and associated linear algebra to be incorporated. TAO was designed to enable 
this flexibility and leverages many of the existing parallel linear algebra tools, 
such as HYP RE PETSc, and Trilinos [TS]. The optimization methods in 
TAO, which include the complementarity methods and solvers to minimize an 
objective function with bounds on the variables, are written to scale on parallel 
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machines and expose many of the algorithmic details, such as the linear solver, 
to the user. This design philosophy assumes that the user has knowledge of an 
application that can be used to improve the parallel performance. 



3 Complementarity Algorithms 

The finite-dimensional nonlinear complementarity problem defined by a given 
function F : 5ft" -> W 1 is to calculate an x* € 5ft™ such that x* > 0, F(x*) > 0, 
and (x* , F(x*)) = 0. The algorithms implemented in this study use the current 
iterate to define an active set, solve a reduced system to calculate a direction, and 
then perform a line search to compute a new iterate that sufficiently decreases 
a merit function. 



3.1 Active-Set Semismooth Method 

The semismooth algorithm reformulates a given complementarity problem as 
a nonsmooth system of equations satisfying a semismooth property by using 
an NCP function. The resulting nonsmooth system of equations is solved with 
Newton's method where an element of the B-subdifferential plays the role of 
the Jacobian matrix in the direction calculation. The active-set semismooth 
method implemented in TAO is based on this idea. 

Mathematically, a function (j> : 5ft 2 — > 5ft is said to be an NCP function if 
4>(a, b) = if and only if a > 0, b > 0, and ab — 0. By defining 



4>{x 1 ,F 1 (x)) 
<p(x 2 ,F 2 (x)) 

<f>(x n , F n (x)) 



for any NCP function <fi, the nonlinear complementarity problem can be refor- 
mulated as finding an x* € 5ft™ such that Q(x*) = 0. In order to globalize a 
Newton method for solving this system of equations, the merit function 

:= \\Mx)\\l 

is typically used in a line search. 

The Fischer-Burmeister function |19j . 



<pFB(a, b) := a + b — \/ a? + 6 2 , 



is one NCP function, where f ™ denotes the reformulation of the nonlinear 
complementarity problem implied by using 4>fb- Even though <&fb is not con- 
tinuously differentiable, a Newton method can still be constructed for finding 
$ FB (x) = 0. 
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The function &fb is differentiable almost everywhere and therefore admits 
a B-subdifferential P§] 

d B <5>FB{x) := \ H e 5R" xrl | 3{x k } C D$ FB with lim V$ FB (a; fc ) = ff 1 , 

where D<$> FB denotes the set of points where <&fb is differentiable. In particular, 

d B ^ FB (x) C {C a (x) +A(x)Vf(x)} 

for nonnegative diagonal matrices D a (x) and Dj,(x) defined componentwise as 
follows HZ|: 

(a) If IKaJi.F^x))^ > 0, then 



|(xi,Fi(x)) 



(b) Otherwise 

([C^yjAWly) 6 {(1- a, 1-/3) | ||(a,/3)|| 2 <l}. 

Furthermore, the merit function, ^fb{x) :— i s continuously dif- 

ferentiable with \/^fb(x) = H t §fb(x) for any H G 3b^fb{x). 

The main computational task in a semismooth Newton method for solv- 
ing the nonsmooth system of equations $fb(x) = is to calculate a direction 
by solving the linear system of equations 

H k d k = -$ FB (x k ), 

where H k is any element of Ob'&fb {x k )- An Armijo line search [2] along this 
direction is then applied to obtain a new iterate that sufficiently decreases the 
merit function. When the full system is solved to find the Newton direction, 
one must use a method suitable for nonsymmetric matrices because of the row 
scaling implied by the characterization of the B-subdifferential. Alternatively, 
one can solve a reduced system, where only a nonnegative diagonal perturbation 
is made to VF(x). This reduced system will be symmetric whenever V-F(x) is 
symmetric, removing the restrictions placed on the linear solver for the full-space 
system. 

The reduced system is obtained by selecting active and inactive sets of con- 
straints. For a fixed < e < 1, define 

A{x) := {i e {i,...,n} [A,0)1m < £ ) 

l(x) := {i E {!,..., n} | [D b (x)}i,i > e} , 
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where A(x) denotes the active constraints at x and X(x) the inactive constraints. 
The characterization of the B-subdifferential can be used to show that 

i G A{x) =>• Xi < K,Fi{x) 

for some n > 0. The latter characterization is used by the active-set method in 
in where they show that for all i in a sufficiently small neighborhood of 
x* , the strongly active components are correctly identified. 

We are now ready to state our active-set semismooth algorithm. 

Algorithm 3.1 Active-Set Semismooth Method 

1. Let F : 5ft™ -> x° G 5ft'\ e G [0, 1), p > 0, p > 2, (3 < 1, and a G (0, \) 
be given. Set k = 0. 

2. If \\<f>(x k )\\ 2 < tol, then stop. 

3. Otherwise, choose D a (x k ) and Dt,(x k ) so that 

D a (x k ) + D b (x k )VF(x k ) G d B $ F B (x k ), 
and determine A k :— A(x k ) and X k := T(x k ). 

4. Let 

d% = -[D a (x k )]^l tAk ^ FB (x k ) A k, 
and approximately solve the reduced system 

{[D b {x k )]-l xk [D a {x k )] Ik ^ + [VF(x k )] Ik ^ d k k = 
-[D b (x k )]- k l <Ik $ F B(x k h* - [VF{x k )} x ^ Ak d k Ak 

to find d k k . 

5. If the descent test 

W FB (x k ) T d k < -p\\d k f 2 
is not satisfied, set d k = — S7 1 & FB (x k ). 

6. Calculate the smallest i G {0, 1, . . .} such that 

V FB (x k + (3 l d k ) < *Fs(x fc ) + af3 k V^ FB (x k fd k , 
and set x k+1 = x k + frdf 1 . 

7. Set k = k + 1, and go to Step 2. 

The advantages of this active-set method are that the linear systems of 
equations solved to calculate the Newton directions are of a reduced size and 
they remain symmetric, positive definite if V ' F{x) is symmetric, positive definite. 
Therefore, the choice of preconditioner and iterative method is not restricted to 
nonsymmetric methods. 
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The implementation of the active-set method in TAO uses p = 10 10 , p = 
2.1, (3 — h, and a = 1CP 4 . The value of e is dynamically chosen as 

min{l||$(x fc )||j,10- 2 } 
^ J l+||VF(^)||a ' 

This choice forces the active-set identification to go to zero as we approach a so- 
lution to the complementarity problem and deals with possible scaling problems 
with the Jacobian of F, since small values for [Dt,]i^ can still have an effect on 
the direction calculation when the Jacobian is poorly scaled. For mixed com- 
plementarity problems where we have both finite lower and upper bounds, the 
Billups formulation is used JU|- The same active-set identification technique is 
used with this formulation. 

3.2 Reduced-Space Method 

The reduced-space method implemented in TAO also selects an active set and 
solves a reduced linear system of equations to calculate a direction. The iterates 
in this method remain within the variable bounds, while no such guarantee is 
made with the active-set semismooth method. This method is motivated by the 
simplicity of implementation in a distributed memory computing environment 
and the computational efficiency of similar methods observed on several classes 
of problems |14| . 

The active and inactive sets used within the reduced-space method are de- 
fined as 

A(x) := {i e {1, ...,n} | Xi = and F t (x) > 0} 
T(x) := {i S {1, . . . ,n} | Xi > or F L (x) < 0} . 

The active set denotes the variables where the lower bound is active and the 
function value can be ignored. The inactive set contains the remainder of the 
variables. At every iteration of the reduced-space method a direction is calcu- 
lated by approximately solving the linear system equation 

[VF(x k )] 7 . k Ik djk — —F I k(x k ) 

and setting d_^k to zero. 

A projected line search is then applied to generate the next iterate x k+1 such 
that 

x fe+1 = n[x k + ad k ], (1) 

where 7r is the projection onto the variable bounds. The step size a is chosen 
such that ||.Fh(7r[a; fe + ctd fc ])|| 2 < (1 — <7a)||i<h(a;)||2, where Fq(x) is defined 
component-wise by 

\Fo(x)} - I Fl{x) if Xl > ° (2) 

[n[Xn ^\mm{F l (x),0} if x, = 0. [) 
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The line search tries step lengths a = ft for (3 € (0, 1) and positive integers j 
such that ft > 7. There are no guarantees the direction calculated is a descent 
direction, so the line search can terminate with cither a new point of sufficient 
improvement or a minimum step length. The implementation uses a — 1CP 4 , 
j3 = 0.5, and a minimum stepsize of 7 = 1CP 12 for the parameter choices. 

If the line search fails to identify such a point, this method performs the 
same line search in the direction d k = —F(x k ). The algorithm continues until 
there is a second failure in the line search or a stationary point has been found. 

The complete reduced-space algorithm follows. 

Algorithm 3.2 Active-Set Reduced Space Method 

1. Let F : 5ft" 5ft" and x° G be given, and set k = 0. 

2. If ||^2(x fc )|| 2 < tol, then stop. 

3. Let 

A(x) := {i e {1, . . . , n} I Xi = and Fi(x) > 0} 
l(x) := {i e {1, . . . ,n} I Xi > or F l (x) < 0} , 

set dj(k = Q, and approximately solve the linear system 

[VF(x k )] IkIk d Ik =-F Ik (x k ) 

to calculate a direction. 
4- If possible, calculate the smallest i G {0, 1, . . . ,i} such that 
\\F n (7T[x k + ftd k })\\ 2 < (1 - aft)\\F Q (x k )\\ 2 . 

5. Otherwise, set d k = —F(x k ), and apply the line search. If no such i is 
found providing sufficient decrease, then stop. 

6. Set x k+1 = ir[x k + ftd k ] and k = k + 1, and go to step 2. 

Although no convergence results have been proven for the reduced-space 
method, it has been demonstrated to be effective, especially on monotone appli- 
cations. The reduced matrix retains symmetry and positive dehniteness when 
they exist in the Jacobian, permitting the use of a symmetric linear solver. 
Furthermore, this method lends itself to a parallel implementation because it 
requires only a few numerical operations other than a linear solver. 

4 Computational Results 

The complementarity methods were implemented in and are distributed with the 
Toolkit for Advanced Optimization. Two series of tests were performed on the 
implementations: one to check the robustness to verify that the methods work 



8 



on a significant number of general problems, and the other to test the parallel 
performance and scalability of the methods on some trial infinite-dimensional 
variational inequalities. The latter tests also validate the merits of our design 
philosophy by showing that the customization of the linear system solver does 
have a significant effect on the overall solution time. For all of these tests, an 
upper limit of 100 linear system solves was placed on the methods. 

4.1 Robustness 

We ran the implemented methods on the complementarity problems contained 
in the MCPLIB collection ^21 to demonstrate robustness on a diverse set of 
problems, including many small complementarity problems and some nonlinear 
obstacle problems and optimal control problems. These computational tests 
were performed on Linux workstation containing a Pentium 4 processor with a 
clock speed of 1.8 GHz and 512 MB of RAM. 

In order to test the robustness, we ran the methods with LUSOL as the 
selected linear system solver. This LU factorization routine is used by several 
optimization and complementarity packages, including MINOS t 2H| and PATH 
|13| . and is known to be robust and efficient in a serial environment. We chose 
to use a direct factorization for this test to eliminate possible problems with the 
choice of preconditioner, iterative method, and termination tolerances for the 
inexact linear system solves. 

Tables Hand |21 report the number of successes and failures for each method 
and model in the test set when using LUSOL. These results indicate that the 
active-set semismooth method, which solves 73.7% of the problems, is more ro- 
bust than the reduced-space method, which solves 65.5% of the problems. The 
overall conclusion to be drawn is that both methods solve a significant frac- 
tion of these test problems. While these robustness results can be improved 
by introducing heuristics into the implementations, we prefer to use the stan- 
dard methods because they are less complicated and work well on the targeted 
infinite-dimensional variational inequalities. 

4.2 Scalability 

To demonstrate the scalability and performance gains that can be made by 
selecting appropriate linear system solvers, we tested the complementarity algo- 
rithms implemented on some discretizations of infinite-dimensional variational 
inequalities. All of the computational tests in this subsection were performed on 
a Linux cluster composed of 350 Pentium Xeon processors with a clock speed of 
2.4 GHz. Each node has a minimum of 1 GB of RAM is connected by a Myrinct 
2000 network. 

One benchmark application is the journal bearing model, a variational prob- 
lem over a two-dimensional region. This problem arises in the determination of 
the pressure distribution in a thin film of lubricant between two circular cylin- 
ders. The infinite-dimensional version of this problem is to find a piecewise 
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Tabic 1: Performance of methods on MCPLIB. 





Active-Set Method 


Reduced-Space Method 


Problem 


Successes 


Failures 


Successes | 


Failures 


ahn 


1 





1 





badfree 


1 





1 





baihaung 


1 





1 





bert_oc 


4 





4 





bertsekas 


6 





3 


3 


billups 





3 





3 


bishop 





1 





1 


bratu 


U 


t 


t 





cammct 





1 





1 


cgereg 


2 


20 


14 


8 


choi 


1 





1 





colvdual 


2 


2 


2 


2 


colvnlp 


6 





6 





cycle 


1 





1 





danny 


6 


2 


6 


2 


degen 


1 





1 





dirkse 


1 


1 


1 


1 


duopoly 





1 





1 


eckstein 


1 








1 


ehl 


2 


10 


6 


6 


electric 





1 





1 


explep 


1 








1 


exros 


1 


4 


1 


4 


ferrralph 


2 





2 





finance 


60 





59 


1 


fixedpt 





2 





2 


force 





2 





2 


freebert 


3 


4 


5 


2 


fried 


5 


5 


6 


4 


friedms 


i 
i 


U 


i 
i 


U 


gafni 


3 





3 





games 


18 


7 





25 


golanmcp 





1 


1 





hanskoop 


7 


3 


5 


5 


hydroc 


2 





2 





jel 


2 





2 





jiangqi 


3 





3 





josephy 


8 





5 


3 


kanzow 


7 





7 





kojshin 


8 





5 


3 


kyh 





4 





4 


lincont 





1 





1 


lstest 





1 





1 


leyffer 


1 





1 
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Table 2: Performance of methods on MCPLIB. 





Active-Set Method 


Reduced-Space Method 


Problem 


Successes 


Failures 


Successes | 


Failures 


mathi 


13 





13 





methan 


1 





1 





mr5mcf 





1 





1 


munson 


3 


1 


3 


1 


nash 


4 





4 





ne-hard 





1 


1 





obstacle 


8 





8 





optcont 


5 





5 





pgvon 





12 


2 


10 


pies 


1 








1 


pizer 


1 


3 


3 


1 


powell 


5 


1 


3 


3 


powellmcp 


6 





6 





poz 


6 





6 





qp 


1 





1 





range 


3 


4 





7 


scarf 


9 


3 


6 


6 


shansim 


1 





1 





shubik 


11 


37 


10 


38 


simple-ex 





1 





1 


simple-red 


1 





1 





spillmcp 





1 





1 


sppe 


2 


1 


3 





sun 


1 





1 





taji 


12 





12 





tiebout 





6 





6 


tinloi 


64 





63 


1 


tinsmall 


63 


1 


63 


1 


titan 


1 


1 


1 


1 


tobin 


4 





4 





tqbilat 


1 


1 


2 





trafelas 





2 





2 


trig 


2 


1 





3 


vonthmcf 





1 





1 


xiaohar 


4 





3 


1 


xu 


35 





5 


30 


Total 


437 


156 


391 


202 
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Figure 1: The journal bearing problem with e = 0.9. 



continuously differentiable function, v : T> i— > R, such that 

< t>, u; g At> > u>;, and u [t^Au — u>;] = 

almost everywhere on T> with v — on 92?, where £> = (0, 2ir) x (0, 26) for some 
constant b > and 

= (1 +ecos£i) 3 , ^(6,6) = esin^i 

with e in (0,1). The eccentricity parameter, s, influences, in particular, the 
difficulty of the problem. Figure ^ shows the solution of the journal bearing 
problem for e = 0.9. The steep gradient in the solution makes this problem a 
difficult benchmark. Other elliptic problems tested include the obstacle, elastic- 
plastic torsion, and combustion models from MINPACK-2; see [3] for a complete 
description of these models. 

Discretization of the journal bearing problem with either finite differences 
or finite elements leads to a standard complementarity problem. The number 
of variables is n = n x n yi where n x and n y are, respectively, the number of grid 
points in each coordinate direction of the domain D. See for |26| a description 
of the finite-element discretization. 

The number of grid points can be very large. In fact, these problems be- 
come so large that the Jacobian matrix cannot even be stored on one machine. 
Direct factorizations exacerbate the computer memory issues because the fill-in 
associated with a direct factorization is significant. For this reason, iterative 
linear solvers are necessary. 

The Jacobian of this problem has a sparse, symmetric, and positive definite 
structure. Considerable improvement can be made by using the conjugate gradi- 
ent method to solve the linear systems with an incomplete factorization used as a 
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preconditioner. Many incomplete factorization preconditioners require nonzeros 
along the diagonal, and the journal bearing problem satisfies this requirement. 

The problem was discretized into 40,000 variables and then solved with three 
different solvers: LUSOL, a conjugate gradient method with an incomplete LU 
preconditioner, and a conjugate gradient method with a block Jacobi precondi- 
tioner such that each block contains an incomplete LU factorization. As Tabled 
shows, use of the conjugate gradient methods with with an appropriate precon- 
ditioner saves a significant amount of time. With the reduced-space method, 
the savings range from 30% on the combustion problem to over 70% on the 
elastic-plastic torsion problem. 



Table 3: Solution times for three linear solvers (sec). 



Problem 


LUSOL 


CG - 1 P 


CG - 2 P 


J Bearing 


278 


136 


99 


EP Torsion 


172 


50 


34.6 


Obstacle 


49 


18.4 


14.3 


Combustion 


29.8 


20.2 


14.1 



To demonstrate the parallel efficiency of the complementarity methods, we 
wrote parallel implementations of the elastic-plastic torsion and obstacle prob- 
lems using the grid management facilities of PETSc, which relies on MPI [2] 
for communications between processors. PETSc provides support for discretiz- 
ing the rectangular region T>, partitioning the surface into multiple regions, and 
assigning each processor to one of these regions. Each processor computes the 
function on its subdomain with respect to the variables in its region. A pre- 
conditioned conjugate gradient method was used to solve the linear systems 
generated by the complementarity algorithms. The preconditioner was a block 
Jacobi preconditioner with incomplete LU factorization. In the reduced-space 
method, the relative tolerance used during the linear solve was 0.01. The results 
are summarized in Tabled when using 1-64 processors. 



The overall efficiency of our implementation is shown in Figure 2(a) Each 



bar indicates the performance of the active-set semismooth method relative to 
its performance on two processors. This number is the ratio of two times the 
execution time using two processors and the number of processors multiplied 
by the time needed to solve the problem using one of those processors. With 
this metric, the overall parallel efficiency active-set semismooth method on the 
obstacle problems using 16 processors was over 83%, and the overall paral- 
lel efficiency using 64 processors was over 60%. The overall efficiency of the 
reduced-space method was similar. 

A second set of tests was performed with the mesh refined according the 
number of processors used to solve the problem. In these tests each processor 
owned 10, 000 variables. The mesh was 100 x 100 when one processor was used 
and 800 x 800 when 64 processors were used. Since the methods require more 
iterations to solve problems with finer meshes, a comparison of running times 
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Table 4: Scalability. 



Problem 


Solver 


mx 


my 






Processors 










1 


2 


4 


8 


16 


32 


64 


Obstacle 


R-S 


800 


800 


2848 


1722 


887 


484 


252 


129 


71 


Obstacle 


A-S 


800 


800 


975 


582 


300 


168 


87 


51 


30 


JBearing 


R-S 


800 


800 


13597 


7786 


7021 


3555 


1861 


930 


544 


JBearing 


A-S 


800 


800 


14025 


7770 


7019 


3540 


1973 


1082 


648 


Eptorsion 


R-S 


800 


800 


835 


548 


287 


152 


112 


58 


41 


Eptorsion 


A-S 


800 


800 


4744 


3118 


1644 


877 


463 


261 


165 


Combustion 


R-S 


800 


800 


332 


228 


122 


63 


31 


15 


8.6 


Combustion 


A-S 


800 


800 


339 


233 


124 


67 


32 


17 


10.2 



would not demonstrate the scalability of the methods. Instead, we used the 
rate of floating-point operations as the measure of efficiency. Figure 2(b) shows 
the average number of floating-point operations performed per second on each 
processor when the reduced space method was used to solve the torsion problem. 
As the overhead of message passing increases, the rate of computation on each 
processor decreases. The problem used over 142 MFlops when one processor 
was used and that rate only decreased to about 124 MFlops when when 64 
processors were used. Our parallel efficiency of both methods by this measure 
often exceeded 80% on sixty four processors. 

linn linn 



8 16 



16 32 64 



(a) Overall Implementation Efficiency 



(b) Floating-Point Efficiency 



Figure 2: Parallel efficiency on the obstacle problem. 
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5 Conclusion 



The complementarity methods implemented within the Toolkit for Advanced 
Optimization are reasonably robust and can effectively solve discretized versions 
of infinite-dimensional variational inequalities. In order to achieve high parallel 
performance, the methods were implemented so that the user has flexibility in 
the choice of data structures, linear algebra, and algorithms. 
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